Suppose we observe an invertible linear process with independent
mean-zero innovations and with coefficients depending on a finite-dimensional
parameter, and we want to estimate the expectation of
some function under the stationary distribution of the process. The
usual estimator would be the empirical estimator. It can be improved
using the fact that the innovations are centered. We construct an
even better estimator using the representation of the observations as
infinite-order moving averages of the innovations. Then the expectation
of the function under the stationary distribution can be written
as the expectation under the distribution of an infinite series in terms
of the innovations, and it can be estimated by a U-statistic of increasing
order (also called an "infinite-order U-statistic") in terms of the
estimated innovations. The estimator can be further improved using
the fact that the innovations are centered. This improved estimator
is optimal if the coefficients of the linear process are estimated
optimally. The variance reduction of our estimator over the empirical
estimator can be considerable.

1. Introduction. There is a large literature on estimation in ergodic time
series driven by independent innovations. In the last fifteen years, optimality
questions have also been addressed. Efficient estimators for the parameters
of ARMA-type processes are constructed by Kreiss (1987a, b), Jeganathan
(1995), Drost, Klaassen and Werker (1997), Koul and Schick (1997)
and Schick and Wefelmeyer (2002a). For invertible linear time series, the
innovations can be estimated, and linear functionals of the innovation
distribution can then be estimated by corresponding empirical estimators based
on the estimated innovations; see Boldin (1982) and Kreiss (1991). Simple
A. SCHICK AND W. WEFELMEYER

and efficient improvements of these estimators are possible if the innovations 
are centered; see Wefelmeyer (1994) and Schick and Wefelmeyer (2002b). 

Here we are interested in estimating functionals of the stationary law. 
Such functionals can be estimated in a straightforward way from observa- 
tions of the time series. Linear functionals of the stationary law can be 
estimated by corresponding empirical estimators. The stationary density 
can be estimated by a kernel estimator; see, for example, Yakowitz (1989), 
Tran (1992) and Honda (2000). 

These estimators are "nonparametric" in that they do not exploit the in- 
formation that the time series is driven by independent innovations. In this 
paper we show how to use this information in order to construct efficient es- 
timators for linear functionals of the stationary law of causal and invertible 
linear processes with coefficients depending on a finite-dimensional parame- 
ter. We restrict attention to estimation of expectations of smooth functions. 
Examples are moments, absolute moments, the characteristic function and 
other transformations of the stationary law. One of the applications would 
be testing for Gaussianity. Under stronger conditions on the time series, 
one could prove corresponding results for expectations of step functions, for 
example, the distribution function. An application would be estimating the 
value at risk in financial mathematics. 

In the simplest such time series, a moving average process of order 1, 
Saavedra and Cao (1999, 2000) show that the specific structure of the model 
allows the stationary density to be estimated at the parametric rate $n^{-1/2}$.
Schick and Wefelmeyer (2004) prove that the estimator of Saavedra and 
Cao is efficient. Analogous parametric rates can also be obtained for estimators
of conditional expectations; see Müller, Schick and Wefelmeyer (2003)
for a result in nonlinear autoregressive processes. Such estimators could be 
combined with the estimators in the present paper in order to efficiently 
estimate functionals of joint laws of linear processes, for example, autoco- 
variance functions. 

A cautionary remark: unlike the usual empirical estimators for functionals 
of the stationary law, our efficient estimators use the full structure of the 
model, in particular, the independence of the innovations. Like all efficient 
estimators, they are therefore sensitive to misspecification of the model.

Specifically, consider observations $Y_1, \dots, Y_n$ from a causal linear process

$$Y_t = X_t + \sum_{s=1}^{\infty} \delta_s X_{t-s}, \qquad t \in \mathbb{Z},$$

with independent and identically distributed innovations $X_t$, $t \in \mathbb{Z}$, with
mean 0 and finite variance. A simple estimator of a linear functional $E[h(Y_0)]$
of the stationary distribution is the empirical estimator $\frac{1}{n}\sum_{j=1}^{n} h(Y_j)$. It does
not use the fact that the process is linear and centered. We shall show how
to construct better estimators if the process is invertible,

$$X_t = Y_t + \sum_{s=1}^{\infty} \gamma_s Y_{t-s}, \qquad t \in \mathbb{Z}.$$

The idea is to express the functional $E[h(Y_0)]$ as $E[h(X_0 + \sum_{s=1}^{\infty} \delta_s X_s)]$ and
to estimate it by a U-statistic of increasing order based on estimated
innovations, taking into account the constraint that the innovations have mean
0. We do this for a situation often encountered in applications: the coefficients
$\delta_1, \delta_2, \dots$ and hence also $\gamma_1, \gamma_2, \dots$ depend on an unknown Euclidean
parameter $\vartheta$.

The construction of our estimator involves several steps. Let us illustrate
them with the simplest example, a linear autoregressive model of order 1,

$$Y_t = \vartheta Y_{t-1} + X_t, \qquad t \in \mathbb{Z},$$

with $\vartheta$ belonging to the interval $(-1, 1)$. Our result is new, and nontrivial,
even for this simple case. The model is a semiparametric model with
one-dimensional parameter $\vartheta$ and infinite-dimensional parameter $P$, the
distribution of the innovations. The stationary distribution of this process thus
depends on the pair $(\vartheta, P)$.

We want to estimate the linear functional $E[h(Y_0)]$ of the stationary
distribution. The obvious estimator is again the empirical estimator $\frac{1}{n}\sum_{j=1}^{n} h(Y_j)$.
It is known that the empirical estimator is a least dispersed regular estimator 
in Markov chain models with completely unspecified transition distribution; 
see Penev (1991), Bickel (1993) and Greenwood and Wefelmeyer (1995). 
Here, however, we are dealing with a semiparametric submodel. Thus, we 
should be able to improve upon this estimator. 

Before we describe our estimator, let us briefly describe a simple
improvement of the empirical estimator, obtained by exploiting the fact that
the innovations, and hence the observations, have mean 0. This is a linear
constraint $E[Y_0] = 0$ on the stationary distribution. For any $c \in \mathbb{R}$ we obtain
a new estimator for $E[h(Y_0)]$:

$$\frac{1}{n}\sum_{j=1}^{n} \bigl(h(Y_j) - cY_j\bigr).$$

For general Markov chain models, Müller, Schick and Wefelmeyer (2001b)
determine the constant $c$ which minimizes the asymptotic variance of the
new estimator. For our autoregressive model, this constant becomes
particularly simple if $h$ is a polynomial. For example, for the stationary variance
$E[Y_0^2]$, that is, $h(y) = y^2$, the optimal constant is

$$c_* = \frac{\mu_3}{(1 + \vartheta)\mu_2},$$

with $\mu_k = E[X_1^k]$. This optimal $c_*$ depends on $P$ and $\vartheta$ and must be
estimated. We estimate $\vartheta$ by the least squares estimator
$\vartheta^* = \sum_{j=1}^{n} Y_{j-1}Y_j \big/ \sum_{j=1}^{n} Y_{j-1}^2$,
the innovations by $Y_j - \vartheta^* Y_{j-1}$ and $\mu_k$ by its empirical estimator based on
estimated innovations:

$$\hat\mu_k = \frac{1}{n}\sum_{j=1}^{n} (Y_j - \vartheta^* Y_{j-1})^k. \tag{1.1}$$

The resulting estimator for $E[Y_0^2]$ is

$$\frac{1}{n}\sum_{j=1}^{n} \Bigl(Y_j^2 - \frac{\hat\mu_3}{(1 + \vartheta^*)\hat\mu_2}\, Y_j\Bigr).$$
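The improved empirical estimator for the stationary variance in the AR(1) model can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: the function names, the centered exponential innovations and the burn-in length are assumptions made only for this example. It computes the least squares estimator of the autoregression parameter, the residual-based moment estimators, the estimated constant, and then both the empirical and the improved estimator.

```python
import random


def simulate_ar1(theta, n, rng, burn_in=200):
    """Return [Y_0, ..., Y_n] from Y_t = theta * Y_{t-1} + X_t.

    The innovations are centered exponentials, a skewed choice made only
    for illustration; the burn-in length is arbitrary.
    """
    y = 0.0
    path = []
    for _ in range(burn_in + n + 1):
        y = theta * y + (rng.expovariate(1.0) - 1.0)  # mean-zero innovation
        path.append(y)
    return path[-(n + 1):]


def variance_estimators(y):
    """Return (empirical, improved, c_hat) for E[Y_0^2] from y = [Y_0..Y_n]."""
    n = len(y) - 1
    # Least squares estimator of the autoregression parameter.
    num = sum(y[j - 1] * y[j] for j in range(1, n + 1))
    den = sum(y[j - 1] ** 2 for j in range(1, n + 1))
    theta_star = num / den
    # Estimated innovations and their empirical second and third moments.
    resid = [y[j] - theta_star * y[j - 1] for j in range(1, n + 1)]
    mu2_hat = sum(r ** 2 for r in resid) / n
    mu3_hat = sum(r ** 3 for r in resid) / n
    # Estimated optimal constant mu_3 / ((1 + theta) * mu_2).
    c_hat = mu3_hat / ((1.0 + theta_star) * mu2_hat)
    empirical = sum(v * v for v in y[1:]) / n
    improved = sum(v * v - c_hat * v for v in y[1:]) / n
    return empirical, improved, c_hat
```

For symmetric innovations the third moment vanishes, so the correction term is close to zero and the two estimators nearly coincide; a gain can only be expected for skewed innovation distributions.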

This simple improvement of the empirical estimator does not use the
autoregressive structure of the chain. As mentioned above, this structure
is exploited by a U-statistic of increasing order. Improving the empirical
estimator then involves three steps. In the first step, we assume $\vartheta$ as known
and exploit the structural relation $Y_t = \vartheta Y_{t-1} + X_t$. In the second step, we
use the information that the innovation distribution has mean 0. The last
step consists of replacing $\vartheta$ by an estimator.

The key step is the first one: we represent the observations as an infinite
series of the innovations:

$$Y_t = \sum_{s=0}^{\infty} \vartheta^s X_{t-s}, \qquad t \in \mathbb{Z}.$$

Suppose first that the parameter $\vartheta$ is known. Then we can calculate the
innovations $X_t = Y_t - \vartheta Y_{t-1}$, $t = 1, \dots, n$, from the observations. Since $Y_0$
has the same distribution as $S = \sum_{s=1}^{\infty} \vartheta^{s-1} X_s$, the problem is now reduced
to estimating the functional

$$E[h(Y_0)] = E[h(S)] = E\Bigl[h\Bigl(\sum_{s=1}^{\infty} \vartheta^{s-1} X_s\Bigr)\Bigr]$$

from i.i.d. observations $X_1, \dots, X_n$. This expectation is approximated by $E[h(S^{(m)})]$
with $S^{(m)} = \sum_{s=1}^{m} \vartheta^{s-1} X_s$ if $m$ increases with $n$. This suggests using the
following variant of a U-statistic as an estimator for $E[h(S^{(m)})]$. Form the
sums

$$S_i(\vartheta) = \sum_{s=1}^{m} \vartheta^{s-1} X_{i(s)} = \sum_{s=1}^{m} \vartheta^{s-1} (Y_{i(s)} - \vartheta Y_{i(s)-1})$$

for injective functions $i$ from $\{1, \dots, m\}$ into $\{1, \dots, n\}$. These sums are
distributed as $S^{(m)}$. Hence we estimate $E[h(Y_0)]$ by an average over these
sums, the U-statistic

$$\hat\kappa(\vartheta) = \frac{(n-m)!}{n!} \sum_{i \in \Phi} h(S_i(\vartheta)),$$

where $\Phi$ denotes the set of all injective functions from $\{1, \dots, m\}$ into $\{1, \dots, n\}$.
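For very small samples the average over all injective maps can be computed by brute force, which may help to make the definition concrete. The sketch below is only an illustration under that assumption (the function name `u_statistic` is hypothetical); it enumerates all ordered selections of distinct indices, which are exactly the injective maps, and is far too slow for realistic sample sizes.

```python
from itertools import permutations


def u_statistic(x, beta, h):
    """Average of h(sum_r beta[r] * x[i(r)]) over all injective maps i.

    Here m = len(beta), n = len(x), and the maps run from {1..m} into {1..n}
    (indices are 0-based in the code).
    """
    m = len(beta)
    total = 0.0
    count = 0
    # permutations(range(n), m) yields every ordered m-tuple of distinct
    # indices, that is, every injective map; there are n!/(n-m)! of them.
    for idx in permutations(range(len(x)), m):
        s = sum(b * x[i] for b, i in zip(beta, idx))
        total += h(s)
        count += 1
    return total / count
```

In the AR(1) setting with known parameter `theta`, one would pass the computed innovations as `x` and `beta = [theta ** s for s in range(m)]`.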

We can show, via Hoeffding decomposition, that if $m = m(n)$ increases
with $n$ at an appropriate rate, then the U-statistic $\hat\kappa(\vartheta)$ is asymptotically
linear,

$$\hat\kappa(\vartheta) = E[h(Y_0)] + \frac{1}{n}\sum_{j=1}^{n} h_*(X_j) + o_p(n^{-1/2}),$$

with influence function $h_* = \sum_{s=1}^{\infty} h_s$, where $h_s(x) = E[h(S) \mid X_s = x] - E[h(S)]$.
For fixed $m$, the U-statistic $\hat\kappa(\vartheta)$ is a least dispersed regular estimator of
$E[h(S^{(m)})] = E[h(\sum_{s=1}^{m} \vartheta^{s-1} X_s)]$ if nothing is known about the distribution
of the $X_j$. See Levit (1974), or argue via the asymptotic equivalence of the
U-statistic and the von Mises statistic and efficiency of the empirical
distribution function [Beran (1977)]. Optimality is preserved if we let $m$ tend
to $\infty$ at the appropriate rate. For U-statistics of increasing order, see also
Shieh (1994) and Heilig and Nolan (2001).
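As a worked illustration (not taken from the text itself), consider $h(y) = y^2$ in the AR(1) model, assuming centered innovations with finite fourth moment and writing $\mu_k = E[X_1^k]$. The cross terms in $E[S^2 \mid X_s = x]$ vanish because the innovations have mean 0, so the components $h_s$ and the influence function can be written down explicitly:

```latex
\begin{align*}
h_s(x) &= E[S^2 \mid X_s = x] - E[S^2]
        = \vartheta^{2(s-1)} (x^2 - \mu_2), \\
h_*(x) &= \sum_{s=1}^{\infty} h_s(x) = \frac{x^2 - \mu_2}{1 - \vartheta^2}, \\
\int h_*^2 \, dP &= \frac{\mu_4 - \mu_2^2}{(1 - \vartheta^2)^2}.
\end{align*}
```

Under these assumptions the last line is the asymptotic variance of the U-statistic for the stationary variance when the parameter is known.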

In Section 2 we prove these results for functionals of the more general
form $E[h(\sum_{s=1}^{\infty} \beta_s X_s)]$ with summable coefficients $\beta_1, \beta_2, \dots$. The results
are of independent interest. For simplicity, we do not prove them under
minimal assumptions on the function h. In our applications to linear time 
series in Sections 4 and 5, we shall need stronger assumptions anyway. The 
assumptions are general enough to cover moments and absolute moments 
and other smooth functions. 

Now we turn to the second step of the construction of our estimator,
exploiting the fact that $X_t$ has mean 0. This is a linear constraint of the form
$E[Y_1 - \vartheta Y_0] = E[X_1] = 0$. The simple improvement of the empirical
estimator $\frac{1}{n}\sum_{j=1}^{n} h(Y_j)$, described above, has used the linear constraint $E[Y_1] = 0$
on observations from a Markov chain. Here we use the constraint $E[X_1] = 0$
on the observed innovations, which are i.i.d. This simplifies improving our
estimator $\hat\kappa(\vartheta)$. Similarly, as above, we form, for any $a \in \mathbb{R}$, the estimator

$$\hat\kappa(\vartheta, a) = \hat\kappa(\vartheta) - a\,\frac{1}{n}\sum_{j=1}^{n} (Y_j - \vartheta Y_{j-1}),$$

which has influence function $x \mapsto h_*(x) - ax$. It is easy to check that the
choice

$$a_* = \frac{E[X_1 h_*(X_1)]}{E[X_1^2]}$$

yields an estimator with smallest asymptotic variance in this class of
estimators. The optimal $a_*$ stems from projection on $[X_1]$. It depends on $P$ and
must be replaced by an estimator. A consistent estimator is

$$\hat a_*(\vartheta) = \frac{\sum_{j=1}^{n} (Y_j - \vartheta Y_{j-1}) \sum_{s=1}^{m} H_{s,j}(\vartheta)}{\sum_{j=1}^{n} (Y_j - \vartheta Y_{j-1})^2},$$

where

$$H_{s,j}(\vartheta) = \frac{(n-m)!}{(n-1)!} \sum_{i \in \Phi,\ i(s) = j} h(S_i(\vartheta)), \qquad s = 1, \dots, m,\ j = 1, \dots, n.$$

This leads us to the estimator

$$\hat\kappa(\vartheta, \hat a_*(\vartheta)) = \hat\kappa(\vartheta) - \hat a_*(\vartheta)\,\frac{1}{n}\sum_{j=1}^{n} (Y_j - \vartheta Y_{j-1}).$$

We show that this is a least dispersed regular estimator of $E[h(Y_0)]$ in
the submodel with known parameter $\vartheta$. For a related efficiency result in
such i.i.d. models with linear constraints, but for simpler functionals, see
Levit (1975). In Section 3 we generalize these results to functionals of the
form $E[h(\sum_{s=1}^{\infty} \beta_s X_s)]$.

The third and last step of the construction of our estimator consists
of replacing $\vartheta$ by an estimator $\hat\vartheta$, leading to the substitution estimator
$\hat\kappa(\hat\vartheta, \hat a_*(\hat\vartheta))$. It then follows from the substitution principle that the
substitution estimator is efficient for $E[h(Y_0)] = E[h(\sum_{s=1}^{\infty} \vartheta^{s-1} X_s)]$ if $\hat\vartheta$ is efficient
for $\vartheta$. Conditions for this principle to hold were first formulated by Klaassen
and Putter (2001) in models with independent and identically distributed
observations, and generalized to Markov chain models by Müller, Schick and
Wefelmeyer (2001a).

In Section 4, rather than checking the conditions for the substitution
principle, we calculate directly the influence function of the substitution
estimator for functionals $E[h(\sum_{s=1}^{\infty} \alpha_s(\vartheta) X_s)]$ from observations which
approximate $X_1, \dots, X_n$. In Section 5 we apply the results of Sections 2-4
to estimate stationary expectations $E[h(Y_0)]$ from observations of causal
invertible linear processes. Efficiency of our estimator follows from Schick
and Wefelmeyer (2002a) who characterize efficient estimators for arbitrary
differentiable functionals in such time series models.

In Section 6 we compare the asymptotic variances of the empirical estima- 
tor, the improved empirical estimator and our estimator for the stationary 
variance in AR(1) models. In this situation the asymptotic variances of the 
estimators can be calculated explicitly. For innovation distributions far from 
normal the variance decrease can be considerable. 
2. Estimating the distribution of an infinite series. Let $X_1, X_2, \dots$ be
independent and identically distributed random variables with

$$E[|X_1|^{2p}] < \infty \tag{2.1}$$

for some $p \ge 1$ and with unknown common distribution $P$. Let $\beta_1, \beta_2, \dots$ be
known real numbers such that

$$\sum_{r=1}^{\infty} |\beta_r| < \infty. \tag{2.2}$$

Then the series

$$S = \sum_{r=1}^{\infty} \beta_r X_r$$

converges almost surely and in $L_{2p}$. Let $h$ be a function from $\mathbb{R}$ to $\mathbb{R}$ such
that

$$|h(x)| \le C_1(1 + |x|^p), \qquad x \in \mathbb{R}, \tag{2.3}$$

$$|h(x + y) - h(x)| \le C_2(1 + |x|^p)(|y| + |y|^p), \qquad x, y \in \mathbb{R}, \tag{2.4}$$

for some finite constants $C_1$ and $C_2$. Then the expectation $E[h(S)]$ is well
defined. Examples of functions $h$ that satisfy (2.3) and (2.4) are polynomials
in $x$ or $|x|$ of degree at most $p$ and Lipschitz continuous functions.

We are interested in estimating $E[h(S)]$ from the observations $X_1, \dots, X_n$.
Let us introduce our estimator. It follows from (2.1)-(2.4) that the infinite
sum $S$ is well approximated by the finite sum $S^{(m)} = \sum_{r=1}^{m} \beta_r X_r$ for
moderately large $m$. Indeed, the Minkowski inequality yields that

$$E\Bigl[\Bigl|\sum_{j=a}^{b} \beta_j X_j\Bigr|^q\Bigr] \le E[|X_1|^q] \Bigl(\sum_{j=a}^{b} |\beta_j|\Bigr)^q, \qquad 1 \le a \le b,\ 1 \le q \le 2p. \tag{2.5}$$

In view of (2.4) and the independence of $S - S^{(m)}$ and $S^{(m)}$,

$$E[|h(S) - h(S^{(m)})|^2] \le C_2^2\, E[(1 + |S^{(m)}|^p)^2]\, E[(|S - S^{(m)}| + |S - S^{(m)}|^p)^2].$$

It is now easy to see that there exists a constant $K$ such that

$$E[|h(S) - h(S^{(m)})|^2] \le K^2 \Bigl(\sum_{r=m+1}^{\infty} |\beta_r|\Bigr)^2 \tag{2.6}$$

and hence

$$|E[h(S)] - E[h(S^{(m)})]| \le K \sum_{r=m+1}^{\infty} |\beta_r|. \tag{2.7}$$

Actually, the constant $K$ can be chosen to be

$$K = 2C_2 \Bigl(1 + \sum_{r=1}^{\infty} |\beta_r|\Bigr)^{2p-1} \bigl(1 + E[X_1^2] + E[|X_1|^{2p}]\bigr).$$

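Recall that $\Phi$ denotes the set of all injective functions from $\{1, \dots, m\}$
to $\{1, \dots, n\}$. The random variables

$$S_i = \sum_{r=1}^{m} \beta_r X_{i(r)}, \qquad i \in \Phi,$$

have the same distribution as $S^{(m)}$. Hence an unbiased estimator of $E[h(S^{(m)})]$
is given by

$$\hat\kappa = \frac{(n-m)!}{n!} \sum_{i \in \Phi} h(S_i).$$

The estimator can be written as a U-statistic,

$$\hat\kappa = \binom{n}{m}^{-1} \sum_{1 \le i(1) < \cdots < i(m) \le n} k_m(X_{i(1)}, \dots, X_{i(m)}),$$

with symmetric kernel $k_m$ defined by

$$k_m(x_1, \dots, x_m) = \frac{1}{m!} \sum_{\pi \in \Pi} h\Bigl(\sum_{r=1}^{m} \beta_r x_{\pi(r)}\Bigr),$$

with $\Pi$ the set of permutations of $\{1, \dots, m\}$. Using standard U-statistic
techniques [see Serfling (1980), page 178, Lemma A and page 184, Lemma B],
we obtain

$$\hat\kappa = \kappa_m + \frac{m}{n}\sum_{j=1}^{n} k_{m,1}(X_j) + R,$$

where

$$\kappa_m = E[k_m(X_1, \dots, X_m)] = E[h(S^{(m)})],$$

$$k_{m,1}(x) = E[k_m(x, X_2, \dots, X_m)] - \kappa_m, \qquad x \in \mathbb{R},$$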
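and the remainder satisfies

$$E[R^2] \le \sum_{r=2}^{m} \binom{m}{r}^2 \binom{n}{r}^{-1} E[k_m^2(X_1, \dots, X_m)].$$

It is easy to check that $E[k_m^2(X_1, \dots, X_m)] \le E[h^2(S^{(m)})]$. Using $m!/(m-r)! \le m^r$
and $n!/(n-r)! \ge (n-r)^r$, we obtain, for $n - m > m^2$,

$$E[R^2] \le E[h^2(S^{(m)})] \sum_{r=2}^{m} \frac{1}{r!} \Bigl(\frac{m^2}{n-m}\Bigr)^r \le E[h^2(S^{(m)})] \Bigl(\frac{m^2}{n-m}\Bigr)^2.$$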
Note also that

$$m k_{m,1}(x) = \sum_{r=1}^{m} \bigl(E[h(S^{(m)}) \mid X_r = x] - E[h(S^{(m)})]\bigr), \qquad x \in \mathbb{R}.$$

Now let

$$h_r(x) = E[h(S) \mid X_r = x] - E[h(S)], \qquad x \in \mathbb{R},\ r = 1, 2, \dots.$$

With the help of (2.4) and the Cauchy-Schwarz inequality, we verify that

$$\int h_r^2 \, dP \le 4E[(h(S) - h(S - \beta_r X_r))^2] \le 4C_2^2\, E[(1 + |S - \beta_r X_r|^p)^2]\, E[(|\beta_r X_r| + |\beta_r X_r|^p)^2].$$

This and the Minkowski inequality show that there exists a constant $C$ such
that, for all sufficiently large $m$ and $k$, $m < k$,

$$\int \Bigl(\sum_{r=m+1}^{k} h_r\Bigr)^2 dP \le C \Bigl(\sum_{r=m+1}^{k} |\beta_r|\Bigr)^2.$$

Thus the series $h_* = \sum_{r=1}^{\infty} h_r$ is well defined in $L_2(P)$ and is the $L_2(P)$-limit
of $\sum_{r=1}^{m} h_r$:

$$\int \Bigl(h_* - \sum_{r=1}^{m} h_r\Bigr)^2 dP \to 0 \qquad \text{as } m \to \infty. \tag{2.8}$$

It follows from the Cauchy-Schwarz inequality and (2.6) that

$$\int \Bigl(m k_{m,1} - \sum_{r=1}^{m} h_r\Bigr)^2 dP \le 4m\, E[|h(S) - h(S^{(m)})|^2] \le 4m K^2 \Bigl(\sum_{r=m+1}^{\infty} |\beta_r|\Bigr)^2 \tag{2.9}$$

for large $m$. We arrive at the following result.

Theorem 2.1. Suppose we can choose $m = m(n)$ such that

$$m^4/n \to 0 \quad \text{and} \quad n^{1/2} \sum_{r=m+1}^{\infty} |\beta_r| \to 0. \tag{2.10}$$

Let $h$ satisfy (2.3) and (2.4). Then the estimator

$$\hat\kappa = \frac{(n-m)!}{n!} \sum_{i \in \Phi} h\Bigl(\sum_{r=1}^{m} \beta_r X_{i(r)}\Bigr)$$

is asymptotically linear for $E[h(S)]$ with influence function $h_* = \sum_{r=1}^{\infty} h_r$:

$$\hat\kappa = E[h(S)] + \frac{1}{n}\sum_{j=1}^{n} h_*(X_j) + o_p(n^{-1/2}).$$

In particular, $\hat\kappa$ is asymptotically normal with variance $\int h_*^2 \, dP$.

We have phrased this and the following theorems about estimators as
asymptotic linearity results. The reason is that asymptotic linearity is useful
for obtaining other, more familiar results about estimators: they are then
seen to be asymptotically normal, their asymptotic variances are easily
calculated and we can check whether they are regular and whether they are
efficient in the sense of being least dispersed among regular estimators.

Remark 2.1. Let us briefly discuss the choice of $m$ in two special cases:

1. Suppose that the coefficients $\beta_1, \beta_2, \dots$ decay exponentially, say

$$|\beta_j| \le C\vartheta^j, \qquad j = 1, 2, \dots,$$

for a finite constant $C$ and a positive number $\vartheta$, $\vartheta < 1$. Then the
requirement (2.10) is satisfied if $m^4/n \to 0$ and $n^{1/2}\vartheta^m \to 0$. The latter holds if
$\log(n)/m \to 0$. If $\vartheta < e^{-1/2}$, it even holds for $m = \log(n)$.

2. Suppose $\beta_j = 0$ for $j > p$. Then we can take $m = p$. We should point out
that in this case $h_* = h_1 + \cdots + h_p$ is a finite sum and (2.8) holds even
though $m$ does not go to $\infty$. This is the classical result for fixed-degree
U-statistics.

As it is very time consuming to calculate $\hat\kappa$ for large $m$, it is advantageous
to choose $m$ as small as possible.
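For exponentially decaying coefficients, the tail sum is a geometric series, so a small order can be found numerically. The following sketch is only a heuristic illustration: the function name `smallest_order` and the tolerance `tol` are assumptions of this example, since the theory only requires the tail term to tend to zero and does not prescribe a finite-sample threshold.

```python
import math


def smallest_order(n, theta, C=1.0, tol=0.01):
    """Smallest m with sqrt(n) * sum_{r>m} C * theta**r <= tol.

    Uses the closed form of the geometric tail,
    sum_{r>m} C * theta**r = C * theta**(m + 1) / (1 - theta),
    and assumes 0 < theta < 1.
    """
    m = 1
    while math.sqrt(n) * C * theta ** (m + 1) / (1.0 - theta) > tol:
        m += 1
    return m
```

The returned order grows only logarithmically in the sample size, in line with the first case of the remark, but the companion requirement that $m^4/n$ be small still has to be checked separately for the sample size at hand.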

Remark 2.2. If the coefficients do not decay fast enough, we may not
be able to satisfy (2.10). For example, if $\beta_j = j^{-1-\alpha}$, $j = 1, 2, \dots$, for some
positive $\alpha$, then $m$ needs to satisfy $m^4/n \to 0$ and $n/m^{2\alpha} \to 0$. But this is
only possible if $\alpha > 2$.
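A short calculation, sketched here by comparing the tail sum with an integral, indicates where this restriction comes from. The polynomial choice of $m$ in the last comment is an illustrative assumption, not a recommendation from the text.

```latex
\begin{align*}
\frac{(m+1)^{-\alpha}}{\alpha}
  \le \sum_{r=m+1}^{\infty} r^{-1-\alpha}
  &\le \int_m^{\infty} x^{-1-\alpha}\,dx = \frac{m^{-\alpha}}{\alpha}, \\
n^{1/2} m^{-\alpha} \to 0
  &\iff \frac{n}{m^{2\alpha}} \to 0, \\
\frac{m^4}{n} \cdot \frac{n}{m^{2\alpha}} &= m^{4 - 2\alpha}.
\end{align*}
% If both factors tend to 0, so does their product m^{4-2\alpha}; since m must
% tend to infinity, this forces \alpha > 2. Conversely, for \alpha > 2 one may
% take m = n^{\gamma} with 1/(2\alpha) < \gamma < 1/4.
```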

Let us now show that $\hat\kappa$ is efficient. For this it suffices to show that $E[h(S)]$
is differentiable at the true $P$ with canonical gradient equal to the influence
function $h_*$ of our estimator $\hat\kappa$. Since we will have to look at distributions
near to, but different from, the true $P$, it will occasionally be convenient to
express the dependence of expectations on the underlying distribution by
writing $E_P$ for $E$. Note that $\kappa(P) = E_P[h(S)]$ defines a functional on the set
of all distributions with finite $2p$th moments. We introduce a local model at
the true $P$ as follows. Let $L_{2,0}(P)$ denote the set of all measurable functions
$g$ from $\mathbb{R}$ to $\mathbb{R}$ such that $\int g \, dP = 0$ and $\int g^2 \, dP < \infty$. To each $g$ in $L_{2,0}(P)$
associate a sequence $g_n$ in $L_{2,0}(P)$ such that

$$|g_n| \le n^{1/2} \quad \text{and} \quad \int (g_n - g)^2 \, dP \to 0. \tag{2.11}$$

A possible choice is $g_n = g\mathbf{1}[2|g| \le n^{1/2}] - \int g\mathbf{1}[2|g| \le n^{1/2}] \, dP$. Let $P_{n,g}$
denote the distribution with $P$-density $1 + n^{-1/2} g_n$. Since $0 \le 1 + n^{-1/2} g_n$
and $\int (1 + n^{-1/2} g_n) \, dP = 1$, the function $1 + n^{-1/2} g_n$ is indeed a probability
density.

Theorem 2.2. Suppose we can choose $m = m(n)$ such that (2.10) holds.
Let $h$ satisfy (2.3) and (2.4). Then the functional $\kappa(P) = E_P[h(S)]$ is
differentiable at $P$ with gradient $h_* = \sum_{r=1}^{\infty} h_r$:

$$n^{1/2}(\kappa(P_{n,g}) - \kappa(P)) \to \int h_* g \, dP, \qquad g \in L_{2,0}(P).$$



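Proof. Let $m = m(n)$ satisfy (2.10). Let $G_{n,0} = 1$ and

$$G_{n,k} = \prod_{r=1}^{k} (1 + n^{-1/2} g_n(X_r)), \qquad k = 1, 2, \dots.$$

Since

$$n^{1/2}(G_{n,k} - 1) = \sum_{r=1}^{k} G_{n,r-1}\, g_n(X_r) = \sum_{r=1}^{k} g_n(X_r) + \sum_{r=2}^{k} g_n(X_r)(G_{n,r-1} - 1)$$

and

$$E[(G_{n,k} - 1)^2] = E[G_{n,k}^2] - 1 = (1 + n^{-1}E[g_n^2(X_1)])^k - 1 \le \frac{k}{n}\, E[g_n^2(X_1)]\, (1 + n^{-1}E[g_n^2(X_1)])^{k-1},$$

we get by an application of the Cauchy-Schwarz inequality and the
independence of $X_r$ and $G_{n,r-1}$ that

$$\Bigl| n^{1/2} E[h(S)(G_{n,m} - 1)] - E\Bigl[h(S) \sum_{r=1}^{m} g_n(X_r)\Bigr] \Bigr| \le \sum_{r=2}^{m} \bigl(E[h^2(S)]\, E[g_n^2(X_r)]\, E[(G_{n,r-1} - 1)^2]\bigr)^{1/2} \to 0.$$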

Since $\int g_n \, dP = 0$, we find that

$$E[h(S) g_n(X_r)] = E[(E[h(S) \mid X_r] - E[h(S)]) g_n(X_r)] = \int g_n h_r \, dP.$$

Thus, in view of (2.8) and (2.11),

$$E\Bigl[h(S) \sum_{r=1}^{m} g_n(X_r)\Bigr] = \int g_n \sum_{r=1}^{m} h_r \, dP \to \int g h_* \, dP.$$

This shows that

$$n^{1/2} E[h(S)(G_{n,m} - 1)] \to \int g h_* \, dP. \tag{2.12}$$

Note that $E_{P_{n,g}}[h(S^{(m)})] = E_P[h(S^{(m)}) G_{n,m}]$, so that $\kappa(P_{n,g}) - \kappa(P)$ equals

$$E_{P_{n,g}}[h(S) - h(S^{(m)})] + E[(h(S^{(m)}) - h(S)) G_{n,m}] + E[h(S)(G_{n,m} - 1)].$$

The desired result now follows from (2.12) and (2.10) because

$$n^{1/2} |E_{P_{n,g}}[h(S) - h(S^{(m)})]| = O\Bigl(n^{1/2} \sum_{r=m+1}^{\infty} |\beta_r|\Bigr)$$

by the same argument that yields (2.7), and

$$n^{1/2} |E[(h(S) - h(S^{(m)})) G_{n,m}]| = O\Bigl(n^{1/2} \sum_{r=m+1}^{\infty} |\beta_r|\Bigr)$$

by (2.6) and $E[G_{n,m}^2] \to 1$. □

Theorems 2.1 and 2.2 imply that $\hat\kappa$ is least dispersed among regular
estimators of $E_P[h(S)]$ if nothing is known about $P$. For an appropriate version
of the convolution theorem, see Bickel, Klaassen, Ritov and Wellner [(1998), 
page 63, Theorem 2, and page 65, Proposition 1]. 

3. Estimation with constraints. In the setting of Section 2, we can find
better estimators for $E[h(S)] = E[h(\sum_{r=1}^{\infty} \beta_r X_r)]$ if additional information
about the distribution $P$ is available. Suppose we know that

$$\int \psi \, dP = 0 \tag{3.1}$$

for some measurable function $\psi$ from $\mathbb{R}$ to $\mathbb{R}$ such that $\int \psi^2 \, dP$ is finite and
positive. An important case is the choice $\psi(x) = x$. This just means that $P$
has mean 0.

Under the constraint (3.1) we can consider the estimator

$$\hat\kappa(a) = \hat\kappa - a\,\frac{1}{n}\sum_{j=1}^{n} \psi(X_j)$$

for real $a$ and verify that it has influence function $h_* - a\psi$ if $m = m(n)$
satisfies (2.10):

$$\hat\kappa(a) = E[h(S)] + \frac{1}{n}\sum_{j=1}^{n} (h_*(X_j) - a\psi(X_j)) + o_p(n^{-1/2}).$$

Its asymptotic variance is minimized for the choice

$$a_* = \frac{\int h_* \psi \, dP}{\int \psi^2 \, dP},$$

which is the coefficient of the projection of $h_*$ onto $\psi$. Let us now
construct an estimator of $a_*$ that is consistent if $m = m(n)$ satisfies (2.10). Our
candidate is

$$\hat a_* = \frac{\sum_{j=1}^{n} \psi(X_j) \sum_{r=1}^{m} H_{r,j}}{\sum_{j=1}^{n} \psi^2(X_j)},$$

where

$$H_{r,j} = \frac{(n-m)!}{(n-1)!} \sum_{i \in \Phi,\ i(r) = j} h(S_i), \qquad r = 1, \dots, m,\ j = 1, \dots, n.$$

Recall that $S_i = \sum_{r=1}^{m} \beta_r X_{i(r)}$ for $i \in \Phi$. In view of the law of large numbers,
we need only show that

$$\frac{1}{n}\sum_{j=1}^{n} \psi(X_j) \sum_{r=1}^{m} H_{r,j} = \frac{1}{n}\sum_{j=1}^{n} h_*(X_j)\psi(X_j) + o_p(1). \tag{3.2}$$

Given $X_1$, the random variable $H_{r,1}$ is a U-statistic (of degree $m - 1$ in the
variables $X_2, \dots, X_n$). Thus we have, for $r = 1, \dots, m$ and $n - m > (m-1)^2$,

$$E[(H_{r,1} - E[H_{r,1} \mid X_1])^2] \le E[h^2(S^{(m)})] \sum_{k=1}^{m-1} \frac{1}{k!} \Bigl(\frac{(m-1)^2}{n-m}\Bigr)^k.$$
From this and the Cauchy-Schwarz inequality, we get

$$E\Bigl[\frac{1}{n}\sum_{j=1}^{n} \Bigl(\sum_{r=1}^{m} (H_{r,j} - E[H_{r,j} \mid X_j])\Bigr)^2\Bigr] \le m \sum_{r=1}^{m} E[(H_{r,1} - E[H_{r,1} \mid X_1])^2] = O(m^4 (n-m)^{-1}).$$

Thus $m^4/n \to 0$ implies that

$$\frac{1}{n}\sum_{j=1}^{n} \Bigl(\sum_{r=1}^{m} (H_{r,j} - E[H_{r,j} \mid X_j])\Bigr)^2 = o_p(1). \tag{3.3}$$

From this and another application of the Cauchy-Schwarz inequality, we can
now conclude that

$$\frac{1}{n}\sum_{j=1}^{n}\sum_{r=1}^{m} \psi(X_j) H_{r,j} = \frac{1}{n}\sum_{j=1}^{n}\sum_{r=1}^{m} \psi(X_j) E[H_{r,j} \mid X_j] + o_p(1).$$

It is easy to check that

$$\sum_{r=1}^{m} E[H_{r,j} \mid X_j] = m(\kappa_m + k_{m,1}(X_j)). \tag{3.4}$$

As $m\kappa_m = o(n^{1/2})$, we obtain from the central limit theorem that

$$\frac{1}{n}\sum_{j=1}^{n} \psi(X_j)\, m\kappa_m = o_p(1).$$

In view of this, (2.8) and (2.9), we can now conclude the desired (3.2). Let
us summarize this in the following theorem.

Theorem 3.1. Suppose we can choose $m = m(n)$ such that (2.10) holds.
Let $h$ satisfy (2.3) and (2.4). Then the estimator

$$\hat\kappa(\hat a_*) = \hat\kappa - \hat a_*\,\frac{1}{n}\sum_{j=1}^{n} \psi(X_j)$$

is asymptotically linear for $\kappa(P) = E_P[h(S)]$ with influence function $h_* - a_*\psi$:

$$\hat\kappa(\hat a_*) = \kappa(P) + \frac{1}{n}\sum_{j=1}^{n} [h_*(X_j) - a_*\psi(X_j)] + o_p(n^{-1/2}).$$

In particular, $\hat\kappa(\hat a_*)$ is asymptotically normal with variance

$$E[(h_*(X_1) - a_*\psi(X_1))^2] = \int h_*^2 \, dP - \frac{(\int h_*\psi \, dP)^2}{\int \psi^2 \, dP}.$$

It is straightforward to check that $h_* - a_*\psi$ is the efficient influence
function for estimators of $E_P[h(S)]$ under the constraint $\int \psi \, dP = 0$; see
Levit (1975). It follows from Theorem 3.1 that $\hat\kappa(\hat a_*)$ is a least dispersed
regular estimator of $E_P[h(S)]$ when $P$ is unknown except for $\int \psi \, dP = 0$;
see again the convolution theorem in Bickel, Klaassen, Ritov and Wellner
[(1998), pages 63 and 65].
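For very small samples, the constrained estimator can be computed by brute force, which may help to see how the averages over maps with one index held fixed enter the estimated projection coefficient. This is a minimal sketch under that assumption, not the authors' implementation; the function name `constrained_u_statistic` is hypothetical, and the default constraint function is the identity, corresponding to centered innovations.

```python
from itertools import permutations


def constrained_u_statistic(x, beta, h, psi=lambda v: v):
    """Return (constrained estimate, unconstrained U-statistic, a_hat)."""
    n, m = len(x), len(beta)
    total = 0.0
    count = 0
    # acc[j] collects h(S_i) over all positions r and maps i with i(r) = j.
    acc = [0.0] * n
    for idx in permutations(range(n), m):  # all injective maps
        val = h(sum(b * x[i] for b, i in zip(beta, idx)))
        total += val
        count += 1
        for i in idx:
            acc[i] += val
    kappa_hat = total / count
    # Number of maps with a fixed value at a fixed position: (n-1)!/(n-m)!.
    per_slot = count / n
    h_sum = [a / per_slot for a in acc]  # sum over r of H_{r,j}
    psi_vals = [psi(v) for v in x]
    a_hat = (sum(p * s for p, s in zip(psi_vals, h_sum))
             / sum(p * p for p in psi_vals))
    constrained = kappa_hat - a_hat * sum(psi_vals) / n
    return constrained, kappa_hat, a_hat
```

When the sample average of the constraint function happens to be exactly zero, the correction vanishes and the constrained and unconstrained estimates coincide.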
4. Estimated coefficients and perturbed observations. Let $X_1, \dots, X_n$
be i.i.d. random variables with distribution $P$ satisfying (2.1). We want to
estimate the expectation $E[h(\sum_{r=1}^{\infty} \beta_r X_r)]$. In the applications to time
series we have in mind, the coefficients $\beta_1 = \alpha_1(\vartheta_0), \beta_2 = \alpha_2(\vartheta_0), \dots$ depend
on an unknown parameter $\vartheta_0$, and the random variables $X_1, \dots, X_n$ are
the unobservable innovations of a time series. In this case, both the
coefficients and the innovations must be estimated from the time series using
estimators of $\vartheta_0$. This will be done in Section 5. In preparation, the present
section considers general estimators $X_{n,1}(\vartheta), \dots, X_{n,n}(\vartheta)$ of $X_1, \dots, X_n$.
Theorem 4.1 shows asymptotic linearity of a U-statistic based on observations
$X_{n,1}(\hat\vartheta), \dots, X_{n,n}(\hat\vartheta)$; Theorem 4.2 treats the case with constraint $\int \psi \, dP = 0$.
As the underlying parameter space we take an open subset $\Theta$ of $\mathbb{R}^d$. We
assume that $\alpha_1, \alpha_2, \dots$ are continuously differentiable functions from $\Theta$ to $\mathbb{R}$
such that, for some $\eta > 0$,

$$\sum_{r=1}^{\infty} |\alpha_r(\vartheta_0)| < \infty \quad \text{and} \quad \sum_{r=1}^{\infty} \sup_{\|\vartheta - \vartheta_0\| \le \eta} \|\dot\alpha_r(\vartheta)\| < \infty, \tag{4.1}$$

where $\dot\alpha_r$ denotes the gradient of $\alpha_r$. Note that this implies that

$$\sum_{r=1}^{\infty} \sup_{\|\vartheta - \vartheta_0\| \le \eta} |\alpha_r(\vartheta)| < \infty \tag{4.2}$$

for the same $\eta$ as in (4.1). We consider random variables $X_{n,1}(\vartheta), \dots, X_{n,n}(\vartheta)$
such that $X_{n,j}(\vartheta)$ approximates $X_j$ if $\vartheta$ is close to $\vartheta_0$: there are $d$-dimensional
random vectors $\zeta_1, \zeta_2, \dots$ such that

$$\sup_{j \ge 1} E[\|\zeta_j\|^2] < \infty, \tag{4.3}$$

$$\max_{1 \le j \le n} n^{-1/2} \|\zeta_j\| = o_p(1), \tag{4.4}$$

$$\sup_{\|t\| \le T} \sum_{j=1}^{n} (X_{n,j}(\vartheta_0 + n^{-1/2}t) - X_j - n^{-1/2} t^\top \zeta_j)^2 = o_p(1) \tag{4.5}$$

for all finite $T$.



Remark 4.1. Conditions (4.3) and (4.4) are implied by uniform
integrability of the variables $\|\zeta_1\|^2, \|\zeta_2\|^2, \dots$. The former is obvious; the latter
follows as

$$P\Bigl(\max_{1 \le j \le n} n^{-1/2} \|\zeta_j\| > \eta\Bigr) \le \frac{1}{\eta^2} \max_{1 \le j \le n} E\bigl[\|\zeta_j\|^2 \mathbf{1}[\|\zeta_j\| > n^{1/2}\eta]\bigr], \qquad \eta > 0.$$

Thus, if the random vectors $\zeta_1, \zeta_2, \dots$ are identically distributed, then (4.3)
and (4.4) follow from $E[\|\zeta_1\|^2] < \infty$. Sufficient conditions for (4.5) are the
asymptotic differentiability of $X_{n,j}$ at $\vartheta_0$ in the sense that

$$\sup_{\|t\| \le T} \sum_{j=1}^{n} (X_{n,j}(\vartheta_0 + n^{-1/2}t) - X_{n,j}(\vartheta_0) - n^{-1/2} t^\top \dot X_{n,j}(\vartheta_0))^2 = o_p(1) \tag{4.6}$$

for all finite $T$ together with

$$\frac{1}{n}\sum_{j=1}^{n} \|\dot X_{n,j}(\vartheta_0) - \zeta_j\|^2 = o_p(1), \tag{4.7}$$

$$\sum_{j=1}^{n} (X_{n,j}(\vartheta_0) - X_j)^2 = o_p(1). \tag{4.8}$$

In applications to time series, $X_{n,j}(\vartheta_0)$ is a truncated series representation
of innovations; see (5.5).

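For $\vartheta \in \Theta$ and $i \in \Phi$, set now

$$S(\vartheta) = \sum_{r=1}^{\infty} \alpha_r(\vartheta) X_r, \qquad S_i(\vartheta) = \sum_{r=1}^{m} \alpha_r(\vartheta) X_{i(r)}, \qquad S_{n,i}(\vartheta) = \sum_{r=1}^{m} \alpha_r(\vartheta) X_{n,i(r)}(\vartheta).$$

Set $S = S(\vartheta_0)$ and $S_i = S_i(\vartheta_0)$. These are the series in Section 2. Think of
$S_{n,i}(\vartheta)$ as an approximation of $S_i(\vartheta)$. Next define

$$\hat\kappa(\vartheta) = \frac{(n-m)!}{n!} \sum_{i \in \Phi} h(S_{n,i}(\vartheta)), \qquad \vartheta \in \Theta.$$

Then $\hat\kappa(\vartheta_0)$ is an "estimator" of $E[h(S)]$ and defined as in Section 2, but now
with $X_1, \dots, X_n$ replaced by $X_{n,1}(\vartheta_0), \dots, X_{n,n}(\vartheta_0)$. Let $\hat\vartheta$ be an estimator
of $\vartheta_0$. In this section we calculate the influence function of $\hat\kappa(\hat\vartheta)$. The result
will be used in Section 5.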
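Because the exact average over all injective maps is expensive, a Monte Carlo approximation is a natural shortcut for experiments. The sketch below is a swapped-in approximation, not the estimator analyzed in the text: it draws uniformly random injective maps instead of summing over all of them. It assumes an AR(1) model, for which the estimated innovations are $Y_j - \vartheta Y_{j-1}$ and the coefficients are powers of $\vartheta$; the function name and the `draws` parameter are hypothetical.

```python
import random


def plug_in_u_statistic_mc(y, theta, m, h, draws, rng):
    """Monte Carlo approximation of the plug-in U-statistic for AR(1).

    y holds Y_0, ..., Y_n; theta is a parameter value or an estimate of it.
    """
    # Estimated innovations X_{n,j}(theta) = Y_j - theta * Y_{j-1}, j = 1..n.
    x = [y[j] - theta * y[j - 1] for j in range(1, len(y))]
    # AR(1) coefficients: theta**0, theta**1, ..., theta**(m-1).
    beta = [theta ** r for r in range(m)]
    total = 0.0
    for _ in range(draws):
        # An ordered sample without replacement is a uniformly random
        # injective map from m positions into the n innovations.
        idx = rng.sample(range(len(x)), m)
        total += h(sum(b * x[i] for b, i in zip(beta, idx)))
    return total / draws
```

The Monte Carlo error adds to the statistical error, so the number of draws would have to grow with the sample size for the approximation to inherit the asymptotic behavior described here; this sketch does not address that choice.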

Assumption H. The function $h$ satisfies (2.3) and (2.4) and is
absolutely continuous with an almost everywhere derivative $h'$ that is almost
surely continuous with respect to the distribution of $S$ and satisfies the
growth condition

$$|h'(x)| \le C_3(1 + |x|)^q, \qquad x \in \mathbb{R},$$

for some constant $C_3$ and some $q \in [0, p]$.

Examples of functions $h$ that satisfy Assumption H are again polynomials
in $x$ or $|x|$ of degree at most $p$ and Lipschitz continuous functions.

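Theorem 4.1. Suppose assumptions (4.1)-(4.5) hold, $h$ satisfies
Assumption H and we can choose $m = m(n)$ such that (2.10) holds with $\beta_r = \alpha_r(\vartheta_0)$.
If $\hat\vartheta$ is $n^{1/2}$-consistent for $\vartheta_0$, then

$$\hat\kappa(\hat\vartheta) = \hat\kappa + \Lambda^\top (\hat\vartheta - \vartheta_0) + o_p(n^{-1/2}), \tag{4.9}$$

where

$$\Lambda = \frac{(n-m)!}{n!} \sum_{i \in \Phi} h'(S_i) D_i,$$

$$D_i = \sum_{r=1}^{m} [\dot\alpha_r(\vartheta_0) X_{i(r)} + \alpha_r(\vartheta_0) \zeta_{i(r)}], \qquad i \in \Phi.$$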

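Proof. For $i \in \Phi$ set

$$D_{n,i} = S_{n,i}(\hat\vartheta) - S_i = \sum_{r=1}^{m} [\alpha_r(\hat\vartheta) X_{n,i(r)}(\hat\vartheta) - \alpha_r(\vartheta_0) X_{i(r)}].$$

Since $h$ is absolutely continuous, we see that

$$\hat\kappa(\hat\vartheta) - \hat\kappa = \frac{(n-m)!}{n!} \sum_{i \in \Phi} D_{n,i} \int_0^1 h'(S_i + zD_{n,i}) \, dz.$$

The desired result can now be written as

$$\frac{(n-m)!}{n!} \sum_{i \in \Phi} \Bigl(D_{n,i} \int_0^1 h'(S_i + zD_{n,i}) \, dz - D_i^\top (\hat\vartheta - \vartheta_0) h'(S_i)\Bigr) = o_p(n^{-1/2}).$$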



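But this is a consequence of the following statements:

$$\frac{(n-m)!}{n!} \sum_{i \in \Phi} (h'(S_i))^2 = O_p(1), \tag{4.10}$$

$$\frac{(n-m)!}{n!} \sum_{i \in \Phi} \|D_i\|^2 = O_p(1), \tag{4.11}$$

$$\frac{(n-m)!}{n!} \sum_{i \in \Phi} (D_{n,i} - D_i^\top (\hat\vartheta - \vartheta_0))^2 = o_p(n^{-1}), \tag{4.12}$$

$$\frac{(n-m)!}{n!} \sum_{i \in \Phi} \int_0^1 (h'(S_i + zD_{n,i}) - h'(S_i))^2 \, dz = o_p(1). \tag{4.13}$$

Of course, (4.10) holds because its left-hand side has an expectation that
converges to $E[h'(S)^2]$ by the properties of $h'$. Next, we have

$$E[\|D_i\|^2] \le 2\Bigl(\sum_{r=1}^{m} \|\dot\alpha_r(\vartheta_0)\|\Bigr)^2 E[X_1^2] + 2\Bigl(\sum_{r=1}^{m} |\alpha_r(\vartheta_0)|\Bigr)^2 \max_{1 \le j \le n} E[\|\zeta_j\|^2] \tag{4.14}$$

by the following version of the Cauchy-Schwarz inequality:

$$\Bigl(\sum_r a_r b_r\Bigr)^2 \le \sum_r |a_r| \sum_r |a_r| b_r^2.$$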

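Relation (4.11) follows from (4.14) and assumptions (4.1)-(4.3). To obtain
relation (4.12) use the formula

$$\frac{(n-m)!}{n!} \sum_{i \in \Phi} \Bigl(\sum_{r=1}^{m} a_r b_{i(r)}\Bigr)^2 \le \Bigl(\sum_{r=1}^{m} |a_r|\Bigr)^2 \frac{1}{n}\sum_{j=1}^{n} b_j^2$$

to bound the left-hand side of (4.12) by

$$\begin{aligned}
&3\Bigl(\sum_{r=1}^{m} |\alpha_r(\hat\vartheta) - \alpha_r(\vartheta_0)|\Bigr)^2 \frac{1}{n}\sum_{j=1}^{n} \|\zeta_j\|^2 \|\hat\vartheta - \vartheta_0\|^2 \\
&\quad + 3\Bigl(\sum_{r=1}^{m} |\alpha_r(\hat\vartheta)|\Bigr)^2 \frac{1}{n}\sum_{j=1}^{n} (X_{n,j}(\hat\vartheta) - X_j - (\hat\vartheta - \vartheta_0)^\top \zeta_j)^2 \\
&\quad + 3\Bigl(\sum_{r=1}^{m} \int_0^1 \|\dot\alpha_r(\vartheta_0 + z(\hat\vartheta - \vartheta_0)) - \dot\alpha_r(\vartheta_0)\| \, dz\Bigr)^2 \frac{1}{n}\sum_{j=1}^{n} |X_j|^2 \|\hat\vartheta - \vartheta_0\|^2.
\end{aligned}$$

The desired (4.12) is now immediate in view of (4.1)-(4.5) and the
$n^{1/2}$-consistency of $\hat\vartheta$. Note that the $n^{1/2}$-consistency of $\hat\vartheta$, the continuity of $\dot\alpha_r$
and (4.1) yield

$$\sum_{r=1}^{m} \int_0^1 \|\dot\alpha_r(\vartheta_0 + z(\hat\vartheta - \vartheta_0)) - \dot\alpha_r(\vartheta_0)\| \, dz = o_p(1).$$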

We also have

$$D_n = \max_{i \in \Phi} |D_{n,i}| = o_p(1). \tag{4.15}$$

This is a consequence of (4.12) and the fact that

$$\max_{i \in \Phi} n^{-1/2} \|D_i\| \le \sum_{r=1}^{\infty} \|\dot\alpha_r(\vartheta_0)\| \max_{1 \le j \le n} n^{-1/2} |X_j| + \sum_{r=1}^{\infty} |\alpha_r(\vartheta_0)| \max_{1 \le j \le n} n^{-1/2} \|\zeta_j\| = o_p(1).$$

Thus it suffices to prove (4.13) with $D_{n,i}$ replaced by $D_{n,i}^* = D_{n,i}\mathbf{1}[|D_{n,i}| \le 1]$.
It follows from Assumption H that

$$Z_{n,i} = \int_0^1 (h'(S_i + zD_{n,i}^*) - h'(S_i))^2 \, dz \le 4C_3^2 (2 + |S_i|)^{2p}, \qquad i \in \Phi.$$

Since $S_i$ has the same distribution as $S^{(m)}(\vartheta_0) = \sum_{r=1}^{m} \alpha_r(\vartheta_0) X_r$ and $S^{(m)}$
converges in $L_{2p}$ to $S$, we see that the random variables $\{Z_{n,i} : i \in \Phi, n \ge 1\}$
are uniformly integrable. Thus (4.13) will follow if we can show that, for
every $L$,

$$\frac{(n-m)!}{n!} \sum_{i \in \Phi} \int_0^1 L \wedge (h'(S_i + zD_{n,i}^*) - h'(S_i))^2 \, dz = o_p(1). \tag{4.16}$$

Fix $L$. Define a map $H$ from $\mathcal{Q}$, the set of all probability measures on the
Borel $\sigma$-field of $\mathbb{R}^2$, into $[0, L]$ by

$$H(Q) = \int L \wedge (h'(x) - h'(y))^2 \, Q(dx, dy), \qquad Q \in \mathcal{Q}.$$

With the aid of this map, we can write the expected value of the left-hand
side of (4.16) as

$$\frac{(n-m)!}{n!} \sum_{i \in \Phi} \int_0^1 H(Q_{n,i}^z) \, dz,$$

where $Q_{n,i}^z$ is the distribution of the bivariate random vector
$Y_{n,i}^z = (S_i + zD_{n,i}^*, S_i)^\top$.
Endow $\mathcal{Q}$ with the topology of weak convergence. This topology is generated
by the Prohorov metric $\rho$. By the properties of $h'$, the map $H$ is bounded
and continuous at $Q_0$, the distribution of $(S, S)^\top$. Note also that $H(Q_0) = 0$.
Hence for $\varepsilon > 0$ there exists $\delta > 0$ such that $\rho(Q, Q_0) < \delta$ implies $|H(Q)| < \varepsilon$.
It thus suffices to show that

$$\sup\{\rho(Q_{n,i}^z, Q_0) : i \in \Phi,\ z \in [0, 1]\} \to 0. \tag{4.17}$$

For this we use the following simple property of the Prohorov metric. If $X$
and $Y$ are two bivariate random vectors with distributions $Q$ and $R$, then
$\rho(Q, R) \le \eta + P(\|X - Y\| > \eta)$ for each $\eta > 0$. Now let $Y_i = (S_i^*, S_i^*)^\top$ with

$$S_i^* = S_i + \sum_{r=1}^{\infty} \alpha_{m+r}(\vartheta_0) X_{n+r}.$$

Then $Y_i$ has distribution $Q_0$ and $\|Y_{n,i}^z - Y_i\| \le \sqrt{2}\,|S_i - S_i^*| + |D_{n,i}^*|$ for all $z \in [0, 1]$
and all $i \in \Phi$. The desired (4.17) is now immediate. □
Remark 4.2. For $i\in\Phi$ set

$$\dot S_i = \sum_{r=1}^m \dot a_r(\vartheta_0)X_{i(r)} \quad\text{and}\quad T_i = \sum_{r=1}^m a_r(\vartheta_0)\xi_{i(r)},$$

so that $\Delta_i = \dot S_i + T_i$. Under the assumptions of Theorem 4.1, one can show
that

(4.18) $\frac{(n-m)!}{n!}\sum_{i\in\Phi} h'(S_i)\dot S_i = \mu + o_p(1),$

where $\mu = E[h'(S)\dot S]$ with $\dot S = \sum_{r=1}^{\infty}\dot a_r(\vartheta_0)X_r$.

One also expects that under mild additional assumptions,

(4.19) $\frac{(n-m)!}{n!}\sum_{i\in\Phi} h'(S_i)T_i = \nu + o_p(1)$

for some vector $\nu\in\mathbb{R}^d$. Then (4.9) simplifies to

$$\hat\kappa(\hat\vartheta) = \hat\kappa + (\mu + \nu)^\top(\hat\vartheta - \vartheta_0) + o_p(n^{-1/2}).$$

In the following lemma we formulate a set of sufficient conditions for (4.19) 
that is useful for the applications we have in mind. 

Lemma 4.1. Suppose Assumption H holds, m^/n— >0, the random vec- 
tors ^^1,^21 • • • fl'^e stationary with -E[||Ci|P] < cxo and 

(4.20) $\sup_{r>s} E\bigl[\|\xi_r - E[\xi_r \mid X_{r-s},\dots,X_{r-1}]\|^2\bigr] \to 0$ as $s\to\infty$.

Then (4.19) holds with

$$\nu = E[h'(S)]\sum_{r=1}^{\infty} a_r(\vartheta_0)E[\xi_1].$$



Proof. Without loss of generality, we may assume that $d = 1$. Let $s$
denote the integer part of $1 + \log(n)$. Let $\Phi_s$ denote the set of all $i$ in $\Phi$
such that $i(q) > s$ and $|i(q) - i(r)| > s$ for all $q,r = 1,\dots,m$ with $q \ne r$. Set
$\xi_{r,s} = E[\xi_r \mid X_{r-s},\dots,X_{r-1}]$ for $r > s$ and

$$T_{i,s} = \sum_{r=1}^m a_r(\vartheta_0)\xi_{i(r),s}, \qquad i\in\Phi_s.$$

Since m^/n — > 0, we have that n^/{n — ms)^ — > 1. This shows that the 
cardinality of $\Phi_s$ is of the same order as that of $\Phi$. Hence the cardinality of
the complement $\Phi\setminus\Phi_s$ of $\Phi_s$ with respect to $\Phi$ is of order $o(n!/(n-m)!)$.
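
The counting step can be checked numerically. The following is a minimal sketch, assuming that $\Phi$ is the set of injective maps $i$ from $\{1,\dots,m\}$ into $\{1,\dots,n\}$ (this is suggested by the normalization $(n-m)!/n!$ but is not restated in this section). Under that assumption, sorting the indices and subtracting $ks$ from the $k$-th smallest gives the closed form $|\Phi_s| = m!\binom{n-ms}{m}$. The function names are hypothetical.

```python
from itertools import permutations
from math import comb, factorial


def count_phi(n, m):
    """|Phi| under the assumption that Phi holds all injective maps {1..m} -> {1..n}."""
    return factorial(n) // factorial(n - m)


def count_phi_s_closed_form(n, m, s):
    """m! * C(n - m*s, m): index vectors with i(q) > s and pairwise gaps > s."""
    if n - m * s < m:
        return 0
    return factorial(m) * comb(n - m * s, m)


def count_phi_s_brute_force(n, m, s):
    """Enumerate Phi and keep the vectors that satisfy the separation conditions."""
    total = 0
    for i in permutations(range(1, n + 1), m):
        separated = all(abs(a - b) > s for k, a in enumerate(i) for b in i[k + 1:])
        if min(i) > s and separated:
            total += 1
    return total
```

For instance, with $n = 10^6$, $m = 5$ and $s = 14$ (the integer part of $1 + \log n$), the ratio `count_phi_s_closed_form(n, m, s) / count_phi(n, m)` is already very close to 1, in line with the statement that the complement is of smaller order.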
We now use this and (4.20) to show that the left-hand side of (4.19) differs
from

$$D = \frac{(n-m)!}{n!}\sum_{i\in\Phi_s} h'(S_i)T_{i,s}$$

by a term of order $o_p(1)$. Indeed, the expected value of the absolute value
of this term is bounded by

$$\frac{(n-m)!}{n!}\Bigl(\sum_{i\in\Phi\setminus\Phi_s} E[|h'(S_i)T_i|] + \sum_{i\in\Phi_s} E[|h'(S_i)(T_{i,s} - T_i)|]\Bigr).$$

Now use the fact that the expected values $E[h'(S_i)^2]$ and $E[T_i^2]$ are uniformly
bounded and that
$E[(T_{i,s} - T_i)^2]^{1/2} \le \sum_{j=1}^{\infty}|a_j(\vartheta_0)|\,\sup_{r>s}\bigl(E[\|\xi_r - \xi_{r,s}\|^2]\bigr)^{1/2}$
to conclude that this bound tends to 0.

It is easy to check that two summands $h'(S_i)T_{i,s}$ and $h'(S_j)T_{j,s}$ of $D$ are
independent if their indices $i$ and $j$ satisfy $|i(r) - j(r)| > s$ for all $r = 1,\dots,m$.
This shows that the variance of $D$ goes to 0, so that $D = E[D] + o_p(1)$. Since
$S_i$ and $T_{i,s}$ are independent for $i\in\Phi_s$, and $S_i$ has the same distribution as
$S^{(m)} = \sum_{r=1}^m a_r(\vartheta_0)X_r$, we have

$$E[D] = E[h'(S^{(m)})]\,\frac{(n-m)!}{n!}\sum_{i\in\Phi_s} E[T_{i,s}].$$

The properties of $h'$ imply $E[h'(S^{(m)})] \to E[h'(S)]$. From (4.20) and (4.1),
we get

$$\sup_{i\in\Phi_s}\Bigl|E[T_{i,s}] - \sum_{r=1}^{\infty} a_r(\vartheta_0)E[\xi_1]\Bigr| \to 0.$$

We can now conclude that $E[D] \to \nu$. This completes the proof. □

Let us now turn to the constrained setting of Section 3, with $\psi$ a function
such that $\int\psi\,dP = 0$ and $\int\psi^2\,dP$ finite and positive. For $\vartheta\in\Theta$ consider

1 " 



where 

a*(i?) 



Y.';=MXn,m)Y.7=iHrM 



(n-l") 

•ite c 
parameter. 



^-.jW = 7?r:W E HS^m, r = l,...,m,j = l,...,n. 
We now write $a_*(\vartheta_0)$ for the $a_*$ of Section 3 to stress the dependence on the parameter.
Theorem 4.2. Suppose the assumptions of Theorem 4.1 hold. Suppose
also that $\psi$ is Lipschitz with an almost everywhere derivative $\psi'$ that is
continuous $P$-almost surely. If $\hat\vartheta$ is $n^{1/2}$-consistent for $\vartheta_0$, then

1 " . 

1 " 



n . , 



with 



1 "■ 

If, in addition, the random vectors $\xi_1,\xi_2,\dots$ are stationary and satisfy (4.20),
then

(4.21) $T_n = E[\psi'(X_1)]E[\xi_1] + o_p(1),$
and hence k^ = k{i},a^:{'&)) equals 

- a,{^o)EW{Xi)]E[il]{^ - ^0) + Op(7i-i/2). 

Proof. Since (4.21) is easy, we prove only the first conclusion. It suffices
to show that

(4.22) $\hat a_*(\hat\vartheta) = a_*(\vartheta_0) + o_p(1),$

(4.23) $\frac{1}{n}\sum_{j=1}^n\bigl(\psi(X_{n,j}(\hat\vartheta)) - \psi(X_j)\bigr) = \frac{1}{n}\sum_{j=1}^n \psi'(X_j)\xi_j^\top(\hat\vartheta - \vartheta_0) + o_p(n^{-1/2}).$

The latter is a special case of Theorem 4.1 with $h$ replaced by $\psi$ and $a_1(\vartheta) = 1$
and $a_r(\vartheta) = 0$ for $r \ge 2$. As $\psi$ is Lipschitz, we obtain from (4.4), (4.5) and
the $n^{1/2}$-consistency of $\hat\vartheta$ that

(4.24) $\Psi_n = \max_{1\le j\le n}|\psi(X_{n,j}(\hat\vartheta)) - \psi(X_j)| = o_p(1).$

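In view of (3.2) and (4.24), the desired statement (4.22) will follow from

(4.25) $\frac{1}{n}\sum_{j=1}^n\bigl(\psi(X_{n,j}(\hat\vartheta)) - \psi(X_j)\bigr)\sum_{r=1}^m H_{r,j}(\vartheta_0) = o_p(1),$

(4.26) $\frac{1}{n}\sum_{j=1}^n \psi(X_{n,j}(\hat\vartheta))\sum_{r=1}^m\bigl(H_{r,j}(\hat\vartheta) - H_{r,j}(\vartheta_0)\bigr) = o_p(1).$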
It follows from (2.9), (3.3) and (3.4) that



-in/ m 



Y.[mKra-Y.Hr,j{l^0)] =Op{l) 






It follows from (4.23) that 



1 " 



- ^(V(X„,,(^)) - i;{Xj))mKm = Op{l). 

Together with (4.24), these statements yield (4.25). Next bound the absolute 
value of the left-hand side of (4.26) by 

E ~, E E mxn,jmc3{i + \Si\ + \Dn,\y\Dn,\ 

r=l ' j=lig$,i(r)=j 

^ ^4 E ~, E E (1 + l^^-l + ^")(l + \Si\ + DnflDnA, 

r=l ' j=liG^,i{r)=j 

where Dn^i and D^ are as in the proof of Theorem 4.1 and C4 is a constant. 
An application of the Cauchy-Schwarz inequality now shows that the square 
of the left-hand side of (4.26) is bounded by m?C'^UnVn, where 

_ (n-m)! ^ 2 
^^^J_ (n-mj. ^i^^x.l + ^r.fil + lS.l+D^f". 

^•=1 j=lig<J>,i(r)=j 

It follows from (4.11) and (4.12) that n[/„ = Op(l). It follows from (4.24), 
(4.15) and g < p — 1 that V^ = Op{l). As m? /n — > 0, we obtain the de- 
sired (4.26). D 

5. Application to semiparametric linear processes. Now we apply Sections 2-4
to real-valued causal invertible processes $Y_t$, $t\in\mathbb{Z}$, with infinite-order
moving average and autoregressive representations

(5.1) $Y_t = X_t + \sum_{s=1}^{\infty}\delta_s(\vartheta)X_{t-s}, \qquad t\in\mathbb{Z},$

(5.2) $Y_t = X_t - \sum_{s=1}^{\infty}\gamma_s(\vartheta)Y_{t-s}, \qquad t\in\mathbb{Z},$

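where the innovations $\{X_t, t\in\mathbb{Z}\}$ are i.i.d. with distribution $P$ which has
mean 0 and finite variance, and the parameter $\vartheta$ varies in an open subset $\Theta$
of $\mathbb{R}^d$. We assume that $\delta_1,\delta_2,\dots$ and $\gamma_1,\gamma_2,\dots$ are continuously differentiable
functions from $\Theta$ into $\mathbb{R}$ with the following growth conditions at the true
parameter $\vartheta = \vartheta_0$: for a finite constant $C$ and positive numbers $\eta$ and $\alpha < 1$,

(5.3) $\sup_{\|\vartheta-\vartheta_0\|\le\eta}\bigl[|\delta_r(\vartheta)| + \|\dot\delta_r(\vartheta)\|\bigr] \le C\alpha^r, \qquad r = 1,2,\dots,$

(5.4) $\sup_{\|\vartheta-\vartheta_0\|\le\eta}\bigl[|\gamma_r(\vartheta)| + \|\dot\gamma_r(\vartheta)\|\bigr] \le C\alpha^r, \qquad r = 1,2,\dots.$

Here $\dot\delta_r$ is the gradient of $\delta_r$, and $\dot\gamma_r$ the gradient of $\gamma_r$.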

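Example 5.1. For the AR(1) process $Y_t = X_t + \vartheta Y_{t-1}$, take $\Theta = (-1,1)$
and set $\gamma_1(\vartheta) = -\vartheta$ and $\gamma_s(\vartheta) = 0$ for $s \ge 2$. The infinite-order moving
average representation holds with $\delta_s(\vartheta) = \vartheta^s$.

Example 5.2. For the MA(1) process $Y_t = X_t + \vartheta X_{t-1}$, take $\Theta = (-1,1)$
and set $\delta_1(\vartheta) = \vartheta$ and $\delta_s(\vartheta) = 0$ for $s \ge 2$. The infinite-order autoregressive
representation holds with $\gamma_s(\vartheta) = (-\vartheta)^s$.

Example 5.3. For the ARMA(1,1) process $Y_t - \vartheta_1Y_{t-1} = X_t - \vartheta_2X_{t-1}$,
take $\Theta = \{(\vartheta_1,\vartheta_2) : \vartheta_1,\vartheta_2\in(-1,1),\ \vartheta_1 \ne \vartheta_2\}$. The infinite-order moving
average representation holds with $\delta_s(\vartheta) = (\vartheta_1 - \vartheta_2)\vartheta_1^{s-1}$ and the infinite-order
autoregressive representation holds with $\gamma_s(\vartheta) = (\vartheta_2 - \vartheta_1)\vartheta_2^{s-1}$.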
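
The coefficient formulas in Examples 5.1-5.3 can be checked numerically. The sketch below is an illustration only (the function names are hypothetical): it builds the ARMA(1,1) coefficients and verifies that the power series $1 + \sum_s\delta_s z^s$ and $1 + \sum_s\gamma_s z^s$ multiply to 1 up to the truncation order, which is what (5.1) and (5.2) require of each other. Setting $\vartheta_2 = 0$ recovers the AR(1) coefficients of Example 5.1.

```python
import numpy as np


def arma11_ma_coeffs(theta1, theta2, n_terms):
    """delta_s = (theta1 - theta2) * theta1**(s-1) for s = 1, ..., n_terms."""
    s = np.arange(1, n_terms + 1)
    return (theta1 - theta2) * theta1 ** (s - 1)


def arma11_ar_coeffs(theta1, theta2, n_terms):
    """gamma_s = (theta2 - theta1) * theta2**(s-1) for s = 1, ..., n_terms."""
    s = np.arange(1, n_terms + 1)
    return (theta2 - theta1) * theta2 ** (s - 1)


def series_product(delta, gamma):
    """Coefficients of (1 + sum delta_s z^s)(1 + sum gamma_s z^s) up to order len(delta)."""
    n = len(delta)
    a = np.concatenate(([1.0], delta))
    b = np.concatenate(([1.0], gamma))
    # Coefficients of order <= n of the product depend only on the first n + 1 terms.
    return np.convolve(a, b)[: n + 1]
```

If the two representations are consistent, `series_product` returns the vector $(1, 0, \dots, 0)$ up to rounding error.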

In the following, we will occasionally write $Y_t(\vartheta)$ for representation (5.1)
of $Y_t$, and $E_P$ for expectation when $P$ is true. We want to estimate the
functional

$$\kappa(\vartheta,P) = E_P[h(Y_1(\vartheta))]$$

from observations $Y_0,\dots,Y_n$. Since the true innovation distribution $P$ has
mean 0, we have the linear constraint $\int x\,P(dx) = 0$, that is, $E_P[\psi(X_1)] = 0$
for $\psi(x) = x$.

Note that if we observe only $Y_1,\dots,Y_n$, we cannot estimate the first few
innovations so well that (4.5) holds. However, (4.5) can be achieved if we also
observe $Y_{-r(n)},\dots,Y_0$ for a properly chosen sequence $r(n)$ of integers. For
example, $r(n) = p - 1$ works for AR($p$). In general, we must have Assumption 3
in Schick and Wefelmeyer (2002a), which under our assumption (5.4)
holds with $r(n)$ proportional to $(\log n)^{1+\varepsilon}$ for some $\varepsilon > 0$. We will assume
in this section that those additional observations are available. Otherwise, 
renumber the observations. 

We apply Section 4 with $a_r = \delta_{r-1}$, $r = 1,2,\dots$, where $\delta_0 = 1$, and take
$X_{n,1}(\vartheta),\dots,X_{n,n}(\vartheta)$ to be truncated versions of the representation (5.2) of
the innovations $X_1,\dots,X_n$ in terms of the observations:

(5.5) $X_{n,j}(\vartheta) = Y_j + \sum_{s=1}^{r(n)+j}\gamma_s(\vartheta)Y_{j-s}, \qquad j = 1,\dots,n,\ \vartheta\in\Theta.$
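
The truncation in (5.5) can be illustrated with a short simulation. This is a minimal sketch under stated assumptions, not the authors' implementation: the function name `truncated_innovations`, the MA(1) example with $\vartheta = 0.5$, the standard normal innovations and the choice of 30 presample observations are all hypothetical. For the MA(1) model the truncation error of $X_{n,j}(\vartheta_0)$ equals $(-\vartheta_0)^{r+j+1}X_{-r-1}$, so it decays geometrically in the number of presample observations.

```python
import numpy as np


def truncated_innovations(y, gamma, r):
    """Apply (5.5): y holds Y_{-r}, ..., Y_n and gamma[s-1] holds gamma_s(theta)."""
    n = len(y) - r - 1
    x_hat = np.empty(n)
    for j in range(1, n + 1):
        past = y[: r + j][::-1]  # Y_{j-1}, Y_{j-2}, ..., Y_{-r}
        x_hat[j - 1] = y[r + j] + gamma[: r + j] @ past
    return x_hat


# Hypothetical MA(1) illustration: Y_t = X_t + theta * X_{t-1}.
rng = np.random.default_rng(0)
theta, r, n = 0.5, 30, 200
x = rng.standard_normal(r + n + 2)            # X_{-r-1}, ..., X_n
y = x[1:] + theta * x[:-1]                    # Y_{-r}, ..., Y_n
gamma = (-theta) ** np.arange(1, r + n + 1)   # gamma_s = (-theta)^s, Example 5.2
x_hat = truncated_innovations(y, gamma, r)
max_error = np.max(np.abs(x_hat - x[r + 2:]))  # compare with the true X_1, ..., X_n
```

With these settings `max_error` is bounded by $0.5^{32}|X_{-r-1}|$, which is negligible next to $n^{-1/2}$.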
It is easy to see that assumption (5.3) implies assumptions (4.1) and (4.2).
Let us now show that (5.3) and (5.4) imply (4.3)-(4.5) with

$$\xi_j = \sum_{s=1}^{\infty}\dot\gamma_s(\vartheta_0)Y_{j-s}, \qquad j = 1,2,\dots.$$

As $\xi_1,\xi_2,\dots$ are stationary and square integrable by (5.4), we obtain (4.3)
and (4.4) from Remark 4.1. To prove relation (4.5), we verify the sufficient
conditions (4.6)-(4.8) with $\dot X_{n,j}(\vartheta) = \sum_{s=1}^{r(n)+j}\dot\gamma_s(\vartheta)Y_{j-s}$. Conditions (4.7)
and (4.8) are easy consequences of the choice of $r(n)$ and assumption (5.4).
We bound the expectation of the left-hand side of (4.6) by 

n / r{n)+j \ 2 

j=i \ s=i m\<T ) 

/ oo \ 2 

< E{Yl) J2 sup hsi^o + n-^'h) - 7,(^o) - n-^'h'' %{^o)\ ■ 



.3 = 1 



\t\\<T 



We have used the Minkowski inequality here. Since $\gamma_1,\gamma_2,\dots$ are continuously
differentiable, each term in the last series converges to 0 as $n$ tends to
$\infty$. Hence the sequence of series converges to 0 since the dominated convergence
theorem applies by (5.4). This proves (4.6) and completes the proof
of (4.5). Finally, assumptions (5.3) and (5.4) imply relation (4.20).
Now set 

m 
r=l 



a,(i9) 






re* j=i 

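Since the random vectors $\xi_1,\xi_2,\dots$ are stationary with $E_P[\xi_1] = 0$, Theorem 4.2
implies

$$\hat\kappa(\hat\vartheta,\hat a_*(\hat\vartheta)) = \hat\kappa(\hat\vartheta) - a_*(\vartheta_0)\frac{1}{n}\sum_{j=1}^n X_j + o_p(n^{-1/2}),$$

and Theorem 4.1, Remark 4.2 and Lemma 4.1 imply

$$\hat\kappa(\hat\vartheta) = \hat\kappa + E_P[h'(Y_1(\vartheta_0))\dot Y_1(\vartheta_0)^\top](\hat\vartheta - \vartheta_0) + o_p(n^{-1/2}),$$

with $\dot Y_1(\vartheta_0) = \sum_{r=1}^{\infty}\dot\delta_r(\vartheta_0)X_{1-r}$. By Theorem 2.1 we have

$$\hat\kappa = \kappa(\vartheta_0,P) + \frac{1}{n}\sum_{j=1}^n K(X_j) + o_p(n^{-1/2}).$$

We arrive at the following result.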

Theorem 5.1. Suppose assumptions (5.3) and (5.4) hold and h satisfies 
Assumption H [with Yi{'Qq) playing the role of 5(i?o)]- Choose m = m{n) 
such that m^ /n — > and \og{n)/m — > 0. If d is n^'"^ -consistent for "Oq, then 

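$$\hat\kappa(\hat\vartheta,\hat a_*(\hat\vartheta)) = \kappa(\vartheta_0,P) + E_P[h'(Y_1(\vartheta_0))\dot Y_1(\vartheta_0)^\top](\hat\vartheta - \vartheta_0) + \frac{1}{n}\sum_{j=1}^n\bigl[K(X_j) - a_*(\vartheta_0)X_j\bigr] + o_p(n^{-1/2}).$$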

Computations are faster if $m$ is small. We may choose $m$ proportional
to $(\log n)^{1+\varepsilon}$ with $\varepsilon > 0$.

Let us now show that $\hat\kappa(\hat\vartheta,\hat a_*(\hat\vartheta))$ is efficient for $E_P[h(Y_1(\vartheta_0))]$ if $\hat\vartheta$ is
efficient for $\vartheta_0$. Schick and Wefelmeyer (2002a) give conditions for local
asymptotic normality and characterize efficient estimators for differentiable
functionals in causal and invertible linear processes. We need only check that
the functional $\kappa(\vartheta_0,P) = E_P[h(Y_1(\vartheta_0))]$ is differentiable in an appropriate
sense, with efficient influence function equal to the influence function of
$\hat\kappa(\hat\vartheta,\hat a_*(\hat\vartheta))$.

We assume from now on that $P$ has finite Fisher information $I(P)$ for location;
that is, $P$ has an absolutely continuous density $f$ and $I(P) = \int\ell^2\,dP < \infty$,
where $\ell = f'/f$. We also assume that the matrix $V(\vartheta_0) = E_P[\xi_1\xi_1^\top]$ is
positive definite.

Local asymptotic normality and differentiability require a local model. It
is introduced in Schick and Wefelmeyer (2002a) as follows. Set

$$G = \Bigl\{g\in L_2(P) : \int g\,dP = \int xg(x)\,P(dx) = 0\Bigr\}.$$

For $g$ in $G$ define $P_{n,g}$ by its $P$-density $1 + n^{-1/2}g_n$ with

where 7(x) = (l,x) and Jnix) = (1, — n^'*^ V x A n^'^) , and 
gn= f gl[\g\ <n^^\x - n~^/^y)ipiy)dy, 



with $\varphi$ the standard normal density. Set $\vartheta_{n,t} = \vartheta_0 + n^{-1/2}t$ for $t\in\mathbb{R}^d$. The
arguments of Theorems 2.2 and 4.1 yield the following result.
Theorem 5.2. Suppose assumptions (5.3) and (5.4) hold and $h$ satisfies
Assumption H [with $Y_1(\vartheta_0)$ playing the role of $S(\vartheta_0)$]. Then, for each $(t,g)\in\mathbb{R}^d\times G$,

$$n^{1/2}\bigl(\kappa(\vartheta_{n,t},P_{n,g}) - \kappa(\vartheta_0,P)\bigr) \to E_P[h'(Y_1(\vartheta_0))\dot Y_1(\vartheta_0)^\top]\,t + \int Kg\,dP.$$



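Schick and Wefelmeyer [(2002a), Section 5] construct a least dispersed
regular estimator $\hat\vartheta_*$ for $\vartheta_0$. It is asymptotically linear,

$$\hat\vartheta_* = \vartheta_0 + \frac{1}{n}\sum_{j=1}^n (V(\vartheta_0)I(P))^{-1}\xi_j\ell(X_j) + o_p(n^{-1/2}).$$

By Theorem 5.1, the substitution estimator $\hat\kappa(\hat\vartheta_*,\hat a_*(\hat\vartheta_*))$ is also asymptotically
linear,

$$\hat\kappa(\hat\vartheta_*,\hat a_*(\hat\vartheta_*)) = \kappa(\vartheta_0,P) + \frac{1}{n}\sum_{j=1}^n\Bigl(E_P[h'(Y_1(\vartheta_0))\dot Y_1(\vartheta_0)^\top](V(\vartheta_0)I(P))^{-1}\xi_j\ell(X_j) + K(X_j) - a_*(\vartheta_0)X_j\Bigr) + o_p(n^{-1/2}).$$

By the characterization in Schick and Wefelmeyer [(2002a), Section 2], Theorem 5.2
shows that the efficient influence function of $\kappa(\vartheta,P)$ equals the
influence function of the substitution estimator $\hat\kappa(\hat\vartheta_*,\hat a_*(\hat\vartheta_*))$, so that the
latter is least dispersed and regular for $\kappa(\vartheta_0,P) = E_P[h(Y_1(\vartheta_0))]$.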

6. Variance reduction in a special case. We illustrate our results with
the autoregressive example considered in the Introduction. Let $Y_0,\dots,Y_n$
be observations from the AR(1) model $Y_t = \vartheta_0Y_{t-1} + X_t$ with $|\vartheta_0| < 1$ and
independent and identically distributed innovations $X_t$ with distribution $P$,
density $f$, mean 0 and finite fourth moment $\mu_4$, where $\mu_k = \int x^k\,P(dx)$,
$k = 2,3,4$. We also assume that $P$ has finite Fisher information $I(P) = \int\ell^2\,dP$
for location, where $\ell = f'/f$.

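We want to estimate the stationary variance

$$\sigma^2 = \kappa(\vartheta_0,P) = E[Y_1^2] = E\Bigl[\Bigl(\sum_{s=0}^{\infty}\vartheta_0^sX_{1-s}\Bigr)^2\Bigr].$$

Here $h(x) = x^2$. The stationary variance reduces to

$$\sigma^2 = \frac{\mu_2}{1 - \vartheta_0^2}.$$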
We consider the following estimators. The empirical estimator of $\sigma^2$ is
$\frac{1}{n}\sum_{j=1}^n Y_j^2$ and has influence function

$$\frac{1}{1 - \vartheta_0^2}\,(y^2 - \vartheta_0^2x^2 - \mu_2).$$



The improved empirical estimator of a is 



-i='-t(yf-7rS^y^,- 



n 



j = is (1 + T?*)A2 



with /xfc as defined in (1.1) and i?* the least squares estimator: 



E"=i>9-iii 



The improved empirical estimator has influence function 

J- - Vq V ^2 / 

For these results we refer to Example 2 in Miiller, Schick and Wefelmeyer (2001b). 
Finally, we write (5"?t(^) = k{'& , a^,{{^)) for our estimator of a"^. Suppose 

that 'd is asymptotically linear with influence function w. Then by Theo- 
rem 5.1 our estimator is asymptotically linear with influence function 



jw{x, y) + {y- -doxf - ^2 — -{y - "dox) ) . 



The least squares estimator has influence function

$$w(x,y) = \frac{1 - \vartheta_0^2}{\mu_2}\,x(y - \vartheta_0x).$$

An efficient estimator ??^ has influence function 

1 _^2 
w{x,y) = —^xi{y-'dQx). 

If we use an efficient estimator 1?^, then the estimator a'\{9^) is efficient 
by Section 5. In the particular case of estimating moments, simpler effi- 
cient estimators are given in Section 6 of Schick and Wefelmeyer (2002a). In 
particular, a simpler efficient estimator of o"^ is 

A2 . 1 „* ^ As ^ 

with p,2 = il2 — ^/^i- 



l-^% h 
The estimator is obtained by replacing $\mu_2$ and $\vartheta_0$ in $\sigma^2 = \mu_2/(1 - \vartheta_0^2)$ by
efficient estimators. The efficient estimator of $\mu_2$ uses the constraint $\mu_1 = 0$.
Of course, both efficient estimators for $\sigma^2$ are stochastically equivalent.
This can be seen directly by simplifying (5'Ht(^#). More generally, (12/ {^ ~ '^'^) 
is stochastically equivalent to (THt(i9) for any Ji-*^' ^-consistent estimator '& of 

Next we determine the asymptotic variances of these estimators. The
empirical estimator has asymptotic variance

$$\frac{1}{(1 - \vartheta_0^2)^2}\Bigl(\mu_4 - \mu_2^2 + \frac{4\mu_2^2\vartheta_0^2}{1 - \vartheta_0^2}\Bigr).$$

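The improved empirical estimator has asymptotic variance

$$\frac{1}{(1 - \vartheta_0^2)^2}\Bigl(\mu_4 - \mu_2^2 - \frac{\mu_3^2}{\mu_2} + \frac{4\mu_2^2\vartheta_0^2}{1 - \vartheta_0^2}\Bigr).$$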

One calculates that the estimators (T?t(^*) and iJ'2/i^~'^*) have the same asymp- 
totic variance. Finally, the efficient estimators (T2t(^#) and /i2/(l ~ '^'ji) have 
asymptotic variance 

$$\frac{1}{(1 - \vartheta_0^2)^2}\Bigl(\mu_4 - \mu_2^2 - \frac{\mu_3^2}{\mu_2} + \frac{4\mu_2\vartheta_0^2}{I(P)(1 - \vartheta_0^2)}\Bigr).$$

The relative asymptotic variance increase of the empirical estimator over
the efficient estimator is

$$\frac{I(P)(1 - \vartheta_0^2)\mu_3^2/\mu_2 + 4\vartheta_0^2\mu_2(\mu_2I(P) - 1)}{I(P)(1 - \vartheta_0^2)(\mu_4 - \mu_2^2 - \mu_3^2/\mu_2) + 4\vartheta_0^2\mu_2}.$$

For the improved empirical estimator, the relative asymptotic variance
increase is

$$\frac{4\vartheta_0^2\mu_2(\mu_2I(P) - 1)}{I(P)(1 - \vartheta_0^2)(\mu_4 - \mu_2^2 - \mu_3^2/\mu_2) + 4\vartheta_0^2\mu_2}.$$

These estimators are efficient for values of $\vartheta_0$ and $P$ for which the corresponding
ratios are 0. The second ratio is 0 if and only if $\vartheta_0 = 0$ or
$\mu_2I(P) = 1$. The latter happens if and only if $P$ is normal. Thus the improved
empirical estimator is efficient if and only if $\vartheta_0 = 0$ or $P$ is normal.
The first ratio is 0 if and only if $\mu_3 = 0$ and also $\vartheta_0 = 0$ or $\mu_2I(P) = 1$.
Thus the empirical estimator is efficient if $P$ is the normal distribution.
For other distributions, it is efficient if and only if $\vartheta_0 = 0$ and $\mu_3 = 0$. The
two ratios are the same if and only if $\mu_3 = 0$, which is the case for symmetric
$P$.

If $\vartheta_0$ is close to 1, both ratios are close to $\mu_2I(P) - 1$. Note that $\mu_2I(P) - 1$
is the relative variance increase of the sample mean versus the efficient estimator
in the location model generated by $P$. It is well known that $\mu_2I(P) - 1$
can be large if $P$ is not normal.
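
The two ratios are easy to evaluate for a given innovation distribution. The sketch below is an illustration only; the function name `variance_ratios` and its argument names are hypothetical, and the formulas are the two ratios stated above. As a worked case, the double exponential density $\tfrac12 e^{-|x|}$ has $\mu_2 = 2$, $\mu_3 = 0$, $\mu_4 = 24$ and $I(P) = 1$, so that $\mu_2I(P) - 1 = 1$.

```python
def variance_ratios(theta0, mu2, mu3, mu4, fisher):
    """Relative asymptotic variance increase over the efficient estimator.

    Returns (empirical, improved_empirical) for the AR(1) stationary variance,
    given the innovation moments mu2, mu3, mu4 and the Fisher information I(P).
    """
    one_minus = 1.0 - theta0 ** 2
    denom = fisher * one_minus * (mu4 - mu2 ** 2 - mu3 ** 2 / mu2) + 4 * theta0 ** 2 * mu2
    ar_part = 4 * theta0 ** 2 * mu2 * (mu2 * fisher - 1.0)
    empirical = (fisher * one_minus * mu3 ** 2 / mu2 + ar_part) / denom
    improved = ar_part / denom
    return empirical, improved
```

For standard normal innovations (`mu2=1, mu3=0, mu4=3, fisher=1`) both ratios vanish for every $\vartheta_0$. For the double exponential case both ratios equal $2/17$ at $\vartheta_0 = 0.5$ and approach $\mu_2I(P) - 1 = 1$ as $\vartheta_0$ tends to 1.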